# Multimodal fine-tuning
Clip Flant5 Xl
Apache-2.0
A vision-language generation model fine-tuned from google/flan-t5-xl for image-text retrieval tasks
Text-to-Image
Transformers English

zhiqiulin
13.44k
2
Clip Flant5 Xxl
Apache-2.0
A vision-language generation model fine-tuned from google/flan-t5-xxl, designed for image-text retrieval tasks
Image-to-Text
Transformers English

zhiqiulin
86.23k
2
Vit Base Patch16 224 In21k Gpt2 Finetuned To Pokemon Descriptions
A vision-language model built on the ViT and GPT-2 architectures, fine-tuned to generate Pokémon descriptions
Text Generation
Transformers

tkarr
29
0
Bert Hateful Memes Expanded
Apache-2.0
A model fine-tuned from bert-base-uncased to identify hateful text content in memes
Text Classification
Transformers

limjiayi
29
4